Skip to content

Conversation

@saqadri
Copy link
Collaborator

@saqadri saqadri commented Oct 27, 2025

Summary

Note: Takes precedence over #589

Fixes the issue where the temporal worker wasn't importing all the modules needed for agent factory. Allows that to be configurable now, so it's easier than ever to declare workflow_tasks and register the module in config.yaml.

Also added the ability to configure activity retry policies, both in code and in config.

Details

  • Temporal-ready agent factory scaffold: The factory template, examples, and documentation now walk through running the agent-factory workflow under Temporal (including spinning up run_worker.py, preloading workflow-task modules, and setting retry policies). This ensures factory-created LLM activities are registered when the worker starts and that the same code runs seamlessly in asyncio or Temporal modes.
  • Configurable workflow-task retries: New top-level workflow_task_retry_policies and workflow_task_modules settings let users preload their own task modules and apply retry overrides (exact names, suffixes, wildcards, or global fallbacks). Temporal workers import the configured modules at startup so activities created outside the core package are available immediately.
  • Deterministic provider error handling: OpenAI, Anthropic, Azure (OpenAI + AI Inference), and Google GenAI completion tasks now surface configuration/authorization failures as explicit non-retryable ApplicationErrors (via the shared helper) while leaving transient errors to retry. This prevents infinite retry loops in Temporal and preserves normal exception semantics in asyncio.
  • Cloud example parity: Added a customer-support style cloud/agent_factory example (plus packaged copy) that demonstrates a custom @workflow_task, the new config hooks, and invocation through @app.async_tool in Temporal workflows.

Changelist

  • Added a reusable WorkflowApplicationError shim and to_application_error() helper so activities can raise Temporal-compatible ApplicationErrors even when the Temporal SDK is not installed (src/mcp_agent/executor/errors.py, tests under tests/executor/test_errors.py).
  • Updated all LLM adapter activities to use to_application_error, introduced _execute_* helpers, and marked provider-specific misconfiguration errors as non-retryable (src/mcp_agent/workflows/llm/augmented_llm_{openai,anthropic,azure,google}.py; tests in tests/workflows/test_llm_provider_errors.py).
  • Extended configuration to include top-level workflow_task_modules and workflow_task_retry_policies, plus a WorkflowTaskRetryPolicy model; documented usage in the Temporal guides and example configs (src/mcp_agent/config.py, docs/advanced/temporal.mdx, docs/mcp-agent-sdk/advanced/durable-agents.mdx, multiple example READMEs/configs).
  • Enhanced MCPApp.workflow_task and _register_global_workflow_tasks to apply retry overrides automatically; worker bootstrap now preloads built-in and user-specified workflow-task modules before building the activity list (src/mcp_agent/app.py, src/mcp_agent/executor/temporal/__init__.py).
  • Refreshed the factory template/CLI scaffolding and cloud example to include main.py, run_worker.py, updated agents.yaml, and README instructions covering both asyncio and Temporal runs (files under src/mcp_agent/data/templates/, src/mcp_agent/cli/commands/init.py, examples/**/agent_factory).
  • Updated basic/server READMEs to emphasise the modern workflow (dependency installation via uv, inspector settings, deployment steps) and removed legacy token example collateral.

Test Plan

  1. CLI Initialization Paths
    • uv run mcp-agent init --template basic my-basic-app
      • cd my-basic-app
      • uv init && uv add "mcp-agent[openai]"
      • uv run main.py (asyncio path)
      • Uncomment server lines in main.py, uv run main.py, then connect with npx @modelcontextprotocol/inspector --transport sse --server-url http://127.0.0.1:8000/sse.
    • uv run mcp-agent init --template server my-server-app
      • Follow README to uv init && uv add "mcp-agent[openai]", uv run main.py.
    • uv run mcp-agent init --template factory my-factory-app
      • cd my-factory-app
      • uv init && uv add "mcp-agent[openai]"
      • Asyncio run: uv run main.py
      • Temporal run: temporal server start-dev, uv run run_worker.py, then in another terminal uv run main.py.
  2. Cloud Agent Factory Example
    • cd examples/cloud/agent_factory
    • cp mcp_agent.secrets.yaml.example mcp_agent.secrets.yaml (populate OpenAI key)
    • uv pip install -r requirements.txt
    • Asyncio path: uv run main.py
    • Temporal path: temporal server start-dev, uv run run_worker.py, then uv run main.py
    • Optional: connect via Inspector while Temporal worker is running.
  3. Temporal Examples Smoke
    • cd examples/cloud/temporal & examples/mcp_agent_server/temporal
    • Verify README instructions: uv pip install -r requirements.txt, temporal server start-dev, uv run temporal_worker.py, uv run main.py.
    • Confirm workflow_task_modules guidance matches behaviour (e.g., add a custom module and ensure it loads).
  4. Provider Error Handling
    • Make an LLM call with invalid credentials (OpenAI/Anthropic/Azure/Google) while running under Temporal; confirm the worker logs a non-retryable ApplicationError and stops retrying.

Summary by CodeRabbit

  • New Features

    • Preloadable workflow-task modules and per-activity retry policies with pattern matching.
    • Cloud Agent Factory demos and runnable worker scripts (updated entry points: main/run_worker).
    • Unified application-level error type to surface non-retryable provider errors.
  • Documentation

    • Detailed Temporal and workflow-task configuration guide with examples and provider notes.
    • New and updated example READMEs and quick-start instructions.
  • Configuration

    • Added workflow_task_modules and workflow_task_retry_policies settings.

@saqadri saqadri requested a review from rholinshead October 27, 2025 23:28
@coderabbitai
Copy link

coderabbitai bot commented Oct 27, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds declarative Temporal workflow-task configuration (module preloading and per-activity retry policies), provider-specific non-retryable error translation and a WorkflowApplicationError wrapper, integrates retry overrides into MCPApp/worker startup, and ships a Cloud Agent Factory example plus template/CLI updates and tests.

Changes

Cohort / File(s) Summary
Docs & examples (Temporal/config)
docs/advanced/temporal.mdx, docs/mcp-agent-sdk/advanced/durable-agents.mdx, examples/cloud/temporal/README.md, src/mcp_agent/data/examples/cloud/temporal/README.md, examples/cloud/agent_factory/README.md, src/mcp_agent/data/examples/cloud/agent_factory/README.md
Document new top-level workflow_task_modules and workflow_task_retry_policies, matching rules, preloading behavior, provider notes, and example YAMLs.
Configuration schema
src/mcp_agent/config.py
Add WorkflowTaskRetryPolicy model (interval coercion, validation, to_temporal_kwargs), and new workflow_task_modules & workflow_task_retry_policies fields on settings; merge logic updated to dedupe lists.
Temporal executor & preloading
src/mcp_agent/executor/temporal/__init__.py
Add _preload_workflow_task_modules() to import configured/default modules before worker assembly; convert dict retry policies to Temporal RetryPolicy with fallback logging; register global tasks after preload.
App integration (retry overrides)
src/mcp_agent/app.py
Add _get_configured_retry_policy(activity_name) to select best-match policy (exact, dotted suffix, longest prefix-wildcard, then *) and merge overrides into task metadata during registration/execution.
Workflow/task error foundation
src/mcp_agent/executor/errors.py
Add WorkflowApplicationError and helper to_application_error() plus a fallback TemporalApplicationError type to unify application-level non-retryable signaling and details propagation.
LLM provider error translation
src/mcp_agent/workflows/llm/augmented_llm_openai.py, ..._anthropic.py, ..._azure.py, ..._google.py
Introduce per-provider _NON_RETRYABLE_* sets and helpers (_execute_openai_request, _execute_anthropic_async, _raise_non_retryable_azure) to convert non-retryable provider errors to WorkflowApplicationError; add/adjust retry_policy decorators (e.g., maximum_attempts: 3).
Examples — Cloud Agent Factory
examples/cloud/agent_factory/*, src/mcp_agent/data/examples/cloud/agent_factory/* (e.g., main.py, custom_tasks.py, agents.yaml, run_worker.py, mcp_agent.config.yaml, requirements.txt, mcp_agent.secrets.yaml.example)
New cloud example showing preloaded workflow task (knowledge_base_lookup_task), usage via route_customer_request tool, Temporal worker scripts, config with workflow_task_modules and per-activity retry policy, and docs.
Examples — Basic Agent Factory updates
examples/basic/agent_factory/*, src/mcp_agent/data/templates/agent_factory.py, src/mcp_agent/data/templates/agent_factory_run_worker.py, src/mcp_agent/data/templates/agents.yaml, src/mcp_agent/data/templates/mcp_agent.config.yaml
Rename/refactor factory demo to main.py/run_worker.py, add MCPApp-backed route_prompt tool, update template agents.yaml format, and add run-worker template.
CLI scaffolding (init)
src/mcp_agent/cli/commands/init.py
Replace token scaffold with factory-runner artifacts, generate main.py, agents.yaml, and run_worker.py, update messages and next-step guidance.
Tests & packaging
tests/executor/test_errors.py, tests/workflows/test_llm_provider_errors.py, pyproject.toml
Add tests for WorkflowApplicationError and provider error translation; update package-data entries to include data/examples and data/templates.

Sequence Diagram(s)

sequenceDiagram
    participant App as MCPApp
    participant Executor as TemporalExecutor
    participant Config as Settings
    participant Worker as TemporalWorker
    participant Activity as Activity/Task
    participant Provider as LLMProvider

    App->>Executor: create_temporal_worker_for_app(app)
    Executor->>Executor: _preload_workflow_task_modules(app)
    Executor->>App: _register_global_workflow_tasks()
    App-->>Executor: tasks registered

    Executor->>Config: _get_configured_retry_policy(activity_name)
    Config-->>Executor: retry_policy dict (best match)
    Executor->>Executor: coerce dict -> RetryPolicy
    Executor->>Activity: run activity with merged retry_policy

    Activity->>Provider: call provider API
    Provider--xActivity: raise provider error
    Activity->>Activity: map to WorkflowApplicationError if non-retryable
    alt non-retryable
        Activity--xWorker: raise WorkflowApplicationError (non_retryable)
        Worker->>App: activity fails (no retry)
    else retryable
        Activity-->>Worker: error -> scheduling/retry per policy
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

Areas to pay extra attention:

  • src/mcp_agent/app.py:_get_configured_retry_policy — wildcard/precedence correctness and edge cases.
  • src/mcp_agent/executor/temporal/__init__.py — import side-effects, optional extras handling, and dict→RetryPolicy coercion with robust logging.
  • Consistency across LLM providers (augmented_llm_*) for non-retryable mappings and usage of to_application_error.
  • Example/task behavior (custom_tasks.py) for scoring and input edge cases.

Possibly related PRs

Suggested reviewers

  • rholinshead

Poem

🐇 I preloaded modules at worker start, so tasks run fast and smart,
I wrapped provider errors to say which should not restart,
Retries now match the rules you write with star and dotted name,
The factory looks up knowledge, routes agents to the game,
Hop, hop — durable workflows stitched together like a chart. 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 44.07% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The PR title "Get agent factory to work with Temporal, and also allow configurable activity policies" is directly related to the main changes in the changeset. The first part accurately describes the addition of workflow task module preloading support that enables the agent factory to function within Temporal workers. The second part correctly references the new workflow_task_retry_policies configuration and WorkflowTaskRetryPolicy model that allow operators to override activity retry behavior. The title is specific and clear without using vague or generic language, making it meaningful for developers scanning project history. While the PR includes additional supporting changes (error handling refactoring, documentation, examples), the title appropriately highlights the two primary technical objectives.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/temporal_worker_module_imports

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Comment on lines 42 to 59
_NON_RETRYABLE_GOOGLE_ERRORS = (
google_exceptions.InvalidArgument,
google_exceptions.FailedPrecondition,
google_exceptions.PermissionDenied,
google_exceptions.NotFound,
google_exceptions.Unauthenticated,
)

if google_exceptions:
_NON_RETRYABLE_GOOGLE_ERRORS = (
google_exceptions.InvalidArgument,
google_exceptions.FailedPrecondition,
google_exceptions.PermissionDenied,
google_exceptions.NotFound,
google_exceptions.Unauthenticated,
)
else: # pragma: no cover
_NON_RETRYABLE_GOOGLE_ERRORS = tuple()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There appears to be duplicate code in the definition of _NON_RETRYABLE_GOOGLE_ERRORS. The tuple is defined twice - once unconditionally at lines 42-47, and then again inside the conditional block at lines 50-59. The first definition will always be overwritten by the conditional one, making it redundant. Consider removing the first definition and keeping only the conditional version to improve code clarity and avoid confusion.

Suggested change
_NON_RETRYABLE_GOOGLE_ERRORS = (
google_exceptions.InvalidArgument,
google_exceptions.FailedPrecondition,
google_exceptions.PermissionDenied,
google_exceptions.NotFound,
google_exceptions.Unauthenticated,
)
if google_exceptions:
_NON_RETRYABLE_GOOGLE_ERRORS = (
google_exceptions.InvalidArgument,
google_exceptions.FailedPrecondition,
google_exceptions.PermissionDenied,
google_exceptions.NotFound,
google_exceptions.Unauthenticated,
)
else: # pragma: no cover
_NON_RETRYABLE_GOOGLE_ERRORS = tuple()
if google_exceptions:
_NON_RETRYABLE_GOOGLE_ERRORS = (
google_exceptions.InvalidArgument,
google_exceptions.FailedPrecondition,
google_exceptions.PermissionDenied,
google_exceptions.NotFound,
google_exceptions.Unauthenticated,
)
else: # pragma: no cover
_NON_RETRYABLE_GOOGLE_ERRORS = tuple()

Spotted by Graphite Agent

Fix in Graphite


Is this helpful? React 👍 or 👎 to let us know.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Legit

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 11

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (8)
examples/cloud/temporal/README.md (1)

83-85: Docs polish: link format and capitalization.

  • Wrap bare URL as a Markdown link.
  • Use “GitHub” (capital H).

Apply:

-1. Install the Temporal CLI by following the instructions at: https://docs.temporal.io/cli/
+1. Install the Temporal CLI by following the instructions at: [docs.temporal.io/cli](https://docs.temporal.io/cli/)
...
-2. You will be redirected to the login page, create an mcp-agent cloud account through Google or Github
+2. You will be redirected to the login page, create an mcp-agent cloud account through Google or GitHub

Also applies to: 148-156

src/mcp_agent/data/examples/cloud/mcp/README.md (1)

148-156: Capitalize “GitHub”.

Apply:

-2. You will be redirected to the login page, create an mcp-agent cloud account through Google or Github
+2. You will be redirected to the login page, create an mcp-agent cloud account through Google or GitHub
src/mcp_agent/workflows/llm/augmented_llm_google.py (1)

399-423: Remove the stray definition at line 42.

The _NON_RETRYABLE_GOOGLE_ERRORS tuple defined at line 42 is immediately overwritten by the conditional block (lines 50–59) and will never be used. This is dead code and should be removed, keeping only the conditional definitions at lines 51 and 59.

src/mcp_agent/workflows/llm/augmented_llm_azure.py (3)

582-596: Avoid splatting full config into ChatCompletionsClient ctor.

Passing model_dump(**) risks unexpected kwargs and TypeError. Limit to known args (e.g., api_version) and gate credential_scopes.

-        if request.config.api_key:
-            azure_client = ChatCompletionsClient(
-                endpoint=request.config.endpoint,
-                credential=AzureKeyCredential(request.config.api_key),
-                **request.config.model_dump(exclude={"endpoint", "credential"}),
-            )
-        else:
-            azure_client = ChatCompletionsClient(
-                endpoint=request.config.endpoint,
-                credential=DefaultAzureCredential(),
-                credential_scopes=request.config.credential_scopes,
-                **request.config.model_dump(
-                    exclude={"endpoint", "credential", "credential_scopes"}
-                ),
-            )
+        client_kwargs = {}
+        api_version = getattr(request.config, "api_version", None)
+        if api_version:
+            client_kwargs["api_version"] = api_version
+
+        if request.config.api_key:
+            azure_client = ChatCompletionsClient(
+                endpoint=request.config.endpoint,
+                credential=AzureKeyCredential(request.config.api_key),
+                **client_kwargs,
+            )
+        else:
+            cred_scopes = getattr(request.config, "credential_scopes", None)
+            if cred_scopes:
+                client_kwargs["credential_scopes"] = cred_scopes
+            azure_client = ChatCompletionsClient(
+                endpoint=request.config.endpoint,
+                credential=DefaultAzureCredential(),
+                **client_kwargs,
+            )

366-371: Use ChatRole enum instead of string comparison.

Comparing response.role to "tool" will fail when role is a ChatRole enum.

-            if response.role == "tool":
+            role = getattr(response, "role", None)
+            if role == ChatRole.TOOL or role == "tool":
                 # TODO: Identify tool name
                 final_text.append(f"[Tool result: {response.content}]")
             else:
                 final_text.append(response.content)

671-683: Let me retry the web search with a more specific query:

Based on the web search results, AsyncAzureOpenAI requires explicit parameter passing with api_key retrieved from environment variables like api_key=os.environ["AZURE_OPENAI_API_KEY"]. The library does not automatically fallback to environment variables when api_key is not passed.

The review comment is valid and actionable.

The current code's else block sets client_params["api_key"] to request.config.api_key (which is None in that branch), rather than either raising an error or explicitly reading from an environment variable. This creates a silent failure: None is passed to AsyncAzureOpenAI, which will likely cause runtime errors downstream instead of failing fast with a clear message.

The suggested fix to raise a ValueError with a clear error message is appropriate and aligns with the library's authentication requirements.

src/mcp_agent/workflows/llm/augmented_llm_anthropic.py (2)

267-277: Guard against missing Anthropic config before building RequestCompletionRequest.

If config.anthropic is None, Pydantic validation will fail when constructing RequestCompletionRequest, and request.config.provider access in the task will crash. Add a friendly, non‑retryable error early.

Apply:

                 request = RequestCompletionRequest(
-                    config=config.anthropic,
+                    config=config.anthropic,
                     payload=arguments,
                 )

With this pre-check just above:

+                if not getattr(config, "anthropic", None):
+                    raise to_application_error(
+                        ValueError("Anthropic provider is not configured"),
+                        non_retryable=True,
+                        type="ConfigurationError",
+                    )

293-299: Defensive guard: usage may be absent.

Anthropic’s response.usage can be None in some cases. Prevent AttributeError by defaulting to 0.

Apply:

-                iteration_input = response.usage.input_tokens
-                iteration_output = response.usage.output_tokens
+                iteration_input = (
+                    getattr(response.usage, "input_tokens", 0) if response.usage else 0
+                )
+                iteration_output = (
+                    getattr(response.usage, "output_tokens", 0) if response.usage else 0
+                )
♻️ Duplicate comments (1)
examples/cloud/agent_factory/mcp_agent.config.yaml (1)

6-11: Verify consistency between module path and activity name.

Similar to the config in src/mcp_agent/data/examples/cloud/agent_factory/mcp_agent.config.yaml, this specifies custom_tasks as the module (Line 7) but uses cloud_agent_factory.knowledge_base_lookup_task in the retry policy (Line 10). Confirm the activity's actual module path to ensure the retry policy will match correctly.

🧹 Nitpick comments (27)
src/mcp_agent/executor/errors.py (1)

5-11: Narrow import guard to ImportError.

Catching Exception masks unrelated errors during import.

Apply:

-try:  # Temporal optional dependency
+try:  # Temporal optional dependency
     from temporalio.exceptions import ApplicationError as TemporalApplicationError
-
-    _TEMPORAL_AVAILABLE = True
-except Exception:  # pragma: no cover
+    _TEMPORAL_AVAILABLE = True
+except ImportError:  # pragma: no cover
     _TEMPORAL_AVAILABLE = False

Based on learnings.

src/mcp_agent/workflows/llm/augmented_llm_google.py (1)

10-14: Tighten the import guard to ImportError.

Avoid swallowing other exceptions during module import.

Apply:

-try:
+try:
     from google.api_core import exceptions as google_exceptions
-except Exception:  # pragma: no cover
+except ImportError:  # pragma: no cover
     google_exceptions = None
examples/cloud/temporal/README.md (1)

77-77: Add a concrete YAML snippet for workflow_task_modules.

Quick example improves setup clarity.

Add:

# mcp_agent.config.yaml
execution_engine: temporal
workflow_task_modules:
  - "your_package.custom_tasks"
workflow_task_retry_policies:
  your_package.custom_tasks.your_task:
    maximum_attempts: 3
examples/basic/agent_factory/README.md (1)

32-38: Add explicit step to start Temporal dev server.

Before running the worker, include starting Temporal to avoid connection errors.

 To exercise the same workflow via Temporal, update `mcp_agent.config.yaml` to set `execution_engine: temporal`, start the worker in another terminal, then invoke the workflow:
+# First, in a separate terminal, start Temporal locally:
+temporal server start-dev
+
 ```bash
 uv run examples/basic/agent_factory/run_worker.py
 # ...in another terminal
 uv run examples/basic/agent_factory/main.py

</blockquote></details>
<details>
<summary>src/mcp_agent/data/templates/README_server.md (2)</summary><blockquote>

`64-66`: **Fix duplicate step numbering.**

You have two sections numbered “4.”; make this one “5.”.


```diff
-4. Deploy as a remote MCP server:
+5. Deploy as a remote MCP server:

101-109: Inspector config table is helpful.

Consider adding “Bearer ” example to clarify header format.

src/mcp_agent/data/examples/cloud/agent_factory/README.md (2)

26-31: Provide fallback install command if requirements.txt is absent.

Some users may not have a requirements.txt in this copied example.

-uv pip install -r requirements.txt
+if [ -f requirements.txt ]; then
+  uv pip install -r requirements.txt
+else
+  # minimal deps for the example
+  uv init
+  uv add "mcp-agent[openai]"
+fi

57-66: Great explanation of preload + activity execution.

Consider adding a tiny config snippet to make values concrete.

# mcp_agent.config.yaml (excerpt)
execution_engine: temporal
workflow_task_modules:
  - custom_tasks
workflow_task_retry_policies:
  cloud_agent_factory.knowledge_base_lookup:
    maximum_attempts: 2
examples/cloud/agent_factory/custom_tasks.py (1)

55-57: Harden limit parsing to avoid ValueError.

Casting request["limit"] directly can throw; clamp and default safely.

-    limit = max(1, int(request.get("limit", 3)))
+    raw_limit = request.get("limit", 3)
+    try:
+        limit = int(raw_limit)
+    except (TypeError, ValueError):
+        limit = 3
+    limit = max(1, limit)
src/mcp_agent/data/templates/README_basic.md (1)

83-91: Inspector table adds clarity.

Optional: show “Authorization: Bearer ” explicitly.

src/mcp_agent/workflows/llm/augmented_llm_azure.py (2)

790-838: Return type and typing alignment.

mcp_content is typed as Iterable but used as a list. Align types; optional but improves clarity and static checking.

-def azure_content_to_mcp_content(
-    content: str | list[ContentItem] | None,
-) -> Iterable[TextContent | ImageContent | EmbeddedResource]:
-    mcp_content: Iterable[TextContent | ImageContent | EmbeddedResource] = []
+def azure_content_to_mcp_content(
+    content: str | list[ContentItem] | None,
+) -> list[TextContent | ImageContent | EmbeddedResource]:
+    mcp_content: list[TextContent | ImageContent | EmbeddedResource] = []

867-879: Harden data URL regex for broader MIME coverage.

Support image/svg+xml etc. Minor improvement.

-    match = re.match(r"data:(image/\w+);base64,(.*)", url)
+    match = re.match(r"^data:(image/[-+.\w]+);base64,([A-Za-z0-9+/=]+)$", url)
src/mcp_agent/data/examples/cloud/agent_factory/main.py (2)

8-9: Avoid hard runtime dependency on FastMCP types in examples.

Gate Context behind TYPE_CHECKING to keep the example importable without FastMCP installed.

-from mcp.server.fastmcp import Context
+from typing import TYPE_CHECKING
+if TYPE_CHECKING:
+    from mcp.server.fastmcp import Context
@@
 async def route_customer_request(
-    prompt: str = "A customer is asking about our pricing and security posture.",
-    context_hits: int = 3,
-    app_ctx: Context | None = None,
+    prompt: str = "A customer is asking about our pricing and security posture.",
+    context_hits: int = 3,
+    app_ctx: "Context | None" = None,
 ) -> str:

Also applies to: 24-30


41-43: Optional: friendlier error if agents.yaml is missing.

Wrap load with a clear exception to aid demo users.

-    specs = load_agent_specs_from_file(str(agents_path), context=context)
+    try:
+        specs = load_agent_specs_from_file(str(agents_path), context=context)
+    except FileNotFoundError as e:
+        raise FileNotFoundError(f"Missing agents.yaml at {agents_path}") from e
tests/workflows/test_llm_provider_errors.py (1)

22-30: Good coverage of non-retryable translation.

Asserts on non_retryable and type are precise and match to_application_error behavior. Consider also asserting .workflow_details when you pass details in future cases.

src/mcp_agent/data/templates/agent_factory.py (2)

4-4: Gate Context type to avoid hard dependency in generated templates.

Keep templates importable without FastMCP; rely on postponed annotations.

-from mcp.server.fastmcp import Context
+from typing import TYPE_CHECKING
+if TYPE_CHECKING:
+    from mcp.server.fastmcp import Context
@@
 async def route_prompt(
-    prompt: str = "Find the README and summarize it", app_ctx: Context | None = None
+    prompt: str = "Find the README and summarize it",
+    app_ctx: "Context | None" = None,
 ) -> str:

Also applies to: 15-20


25-31: Optional: make provider/server_names configurable for reuse.

Expose these as parameters with sensible defaults to make the scaffold more flexible.

-    router = await create_router_llm(
-        server_names=["filesystem", "fetch"],
-        agents=specs,
-        provider="openai",
-        context=context,
-    )
+    router = await create_router_llm(
+        server_names=kwargs.get("server_names", ["filesystem", "fetch"]),
+        agents=specs,
+        provider=kwargs.get("provider", "openai"),
+        context=context,
+    )
src/mcp_agent/data/examples/cloud/agent_factory/custom_tasks.py (2)

46-57: Prefer a typed request model; clamp limits and surface validation errors early.

Using a Pydantic model improves schema, validation, and tool UX.

-from mcp_agent.executor.workflow_task import workflow_task
+from pydantic import BaseModel, Field
+from mcp_agent.executor.workflow_task import workflow_task
@@
-@workflow_task(name="cloud_agent_factory.knowledge_base_lookup")
-async def knowledge_base_lookup_task(request: dict) -> List[str]:
+class KBLookupRequest(BaseModel):
+    query: str = ""
+    limit: int = Field(3, ge=1, le=len(_KNOWLEDGE_BASE))
+
+@workflow_task(name="cloud_agent_factory.knowledge_base_lookup")
+async def knowledge_base_lookup_task(request: KBLookupRequest) -> List[str]:
@@
-    query = str(request.get("query", "")).lower()
-    limit = max(1, int(request.get("limit", 3)))
+    query = request.query.lower()
+    limit = request.limit

58-66: Nit: avoid sorting full KB when limit is small.

Use nlargest to reduce work on bigger KBs.

-    ranked = sorted(
-        _KNOWLEDGE_BASE,
-        key=lambda entry: _score(query, entry),
-        reverse=True,
-    )
-    top_entries = ranked[:limit]
+    import heapq
+    top_entries = heapq.nlargest(limit, _KNOWLEDGE_BASE, key=lambda e: _score(query, e))
src/mcp_agent/workflows/llm/augmented_llm_openai.py (2)

957-962: Consider stable activity names to ease config-based overrides.

Explicit names make policy overrides resilient to refactors/renames.

-    @workflow_task(retry_policy={"maximum_attempts": 3})
+    @workflow_task(name="llm.openai.request_completion", retry_policy={"maximum_attempts": 3})
@@
-    @workflow_task(retry_policy={"maximum_attempts": 3})
+    @workflow_task(name="llm.openai.request_structured_completion", retry_policy={"maximum_attempts": 3})

Also applies to: 981-986


76-82: Consider adding ConflictError (409) to non-retryable tuple; generic 4xx catch-all is optional.

The OpenAI SDK exposes ConflictError as a concrete 4xx subclass, which is not currently covered. The current tuple omits it:

  • Missing: ConflictError (409)
  • Correctly omitted: RateLimitError (429) — this is transient/retryable and should not be in the non-retryable set

Your suggested generic 4xx catch-all with _is_non_retryable_status() is a reasonable enhancement, but at minimum, add ConflictError to the tuple for coverage completeness:

_NON_RETRYABLE_OPENAI_ERRORS = (
    AuthenticationError,
    PermissionDeniedError,
    BadRequestError,
    NotFoundError,
    UnprocessableEntityError,
    ConflictError,  # 409
)

The status-code-based approach is optional; either approach works depending on preference for explicitness vs. robustness.

src/mcp_agent/workflows/llm/augmented_llm_anthropic.py (2)

798-823: Respect Anthropic base_url in Async client; keep translation.

When provider is “anthropic”, AsyncAnthropic ignores a configured base_url. Pass it to match generate_structured behavior.

Apply:

-            client = AsyncAnthropic(api_key=request.config.api_key)
+            client = AsyncAnthropic(
+                api_key=request.config.api_key,
+                base_url=getattr(request.config, "base_url", None),
+            )
             response = await _execute_anthropic_async(client, payload)

494-515: Reuse translation helper on non-streaming fallback.

On the streaming fallback path, you already fall back to create(); use the same error translation for consistency with activities.

Apply:

-                    # Fallback to create() so the call succeeds while still emitting spans.
-                    final = await client.messages.create(**args)
+                    # Fallback to create() so the call succeeds while still emitting spans.
+                    final = await _execute_anthropic_async(client, args)
src/mcp_agent/cli/commands/init.py (1)

383-391: Optional: factor chmod into a helper to DRY and handle cross‑platform quirks.

You set +x in multiple places; extract a small make_executable(Path) helper and reuse.

examples/cloud/agent_factory/main.py (1)

45-51: Make router provider configurable.

Hard-coding provider="openai" reduces portability. Consider reading from config (e.g., context.config.router_provider or default_provider) and allowing override.

src/mcp_agent/executor/temporal/__init__.py (1)

213-221: Make RetryPolicy conversion more robust.

Non-TypeError (e.g., ValueError) can arise from bad field values. Widen catch to Exception and include the exception type in the log.

Apply:

-            except TypeError as exc:
+            except Exception as exc:
                 logger.warning(
                     "Invalid retry policy configuration; falling back to default",
-                    data={"activity": activity_name, "error": str(exc)},
+                    data={
+                        "activity": activity_name,
+                        "error": str(exc),
+                        "error_type": exc.__class__.__name__,
+                    },
                 )
                 retry_policy = None
src/mcp_agent/config.py (1)

637-639: Clarify merge/precedence for workflow_task_modules (Temporal vs top-level).

If both Settings.workflow_task_modules and TemporalSettings.workflow_task_modules are set, define/implement union + de-dup semantics and document which wins on conflicts. Consider a small validator to strip empties and de-duplicate.

How are these combined at worker bootstrap time? Please confirm intended precedence (e.g., top-level union Temporal-specific, then de-dup).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7418086 and bae44bf.

📒 Files selected for processing (46)
  • docs/advanced/temporal.mdx (1 hunks)
  • docs/mcp-agent-sdk/advanced/durable-agents.mdx (1 hunks)
  • examples/basic/agent_factory/README.md (2 hunks)
  • examples/basic/agent_factory/main.py (1 hunks)
  • examples/basic/agent_factory/run_worker.py (1 hunks)
  • examples/cloud/agent_factory/README.md (1 hunks)
  • examples/cloud/agent_factory/agents.yaml (1 hunks)
  • examples/cloud/agent_factory/custom_tasks.py (1 hunks)
  • examples/cloud/agent_factory/main.py (1 hunks)
  • examples/cloud/agent_factory/mcp_agent.config.yaml (1 hunks)
  • examples/cloud/agent_factory/mcp_agent.secrets.yaml.example (1 hunks)
  • examples/cloud/agent_factory/requirements.txt (1 hunks)
  • examples/cloud/agent_factory/run_worker.py (1 hunks)
  • examples/cloud/temporal/README.md (1 hunks)
  • examples/cloud/temporal/mcp_agent.config.yaml (1 hunks)
  • examples/mcp_agent_server/temporal/README.md (1 hunks)
  • examples/mcp_agent_server/temporal/mcp_agent.config.yaml (1 hunks)
  • src/mcp_agent/app.py (3 hunks)
  • src/mcp_agent/cli/commands/init.py (5 hunks)
  • src/mcp_agent/config.py (3 hunks)
  • src/mcp_agent/data/examples/cloud/agent_factory/README.md (1 hunks)
  • src/mcp_agent/data/examples/cloud/agent_factory/agents.yaml (1 hunks)
  • src/mcp_agent/data/examples/cloud/agent_factory/custom_tasks.py (1 hunks)
  • src/mcp_agent/data/examples/cloud/agent_factory/main.py (1 hunks)
  • src/mcp_agent/data/examples/cloud/agent_factory/mcp_agent.config.yaml (1 hunks)
  • src/mcp_agent/data/examples/cloud/agent_factory/mcp_agent.secrets.yaml.example (1 hunks)
  • src/mcp_agent/data/examples/cloud/agent_factory/requirements.txt (1 hunks)
  • src/mcp_agent/data/examples/cloud/agent_factory/run_worker.py (1 hunks)
  • src/mcp_agent/data/examples/cloud/mcp/README.md (1 hunks)
  • src/mcp_agent/data/examples/cloud/temporal/README.md (1 hunks)
  • src/mcp_agent/data/examples/cloud/temporal/mcp_agent.config.yaml (1 hunks)
  • src/mcp_agent/data/templates/README_basic.md (6 hunks)
  • src/mcp_agent/data/templates/README_factory.md (1 hunks)
  • src/mcp_agent/data/templates/README_server.md (4 hunks)
  • src/mcp_agent/data/templates/agent_factory.py (1 hunks)
  • src/mcp_agent/data/templates/agent_factory_run_worker.py (1 hunks)
  • src/mcp_agent/data/templates/agents.yaml (5 hunks)
  • src/mcp_agent/data/templates/mcp_agent.config.yaml (1 hunks)
  • src/mcp_agent/executor/errors.py (1 hunks)
  • src/mcp_agent/executor/temporal/__init__.py (7 hunks)
  • src/mcp_agent/workflows/llm/augmented_llm_anthropic.py (5 hunks)
  • src/mcp_agent/workflows/llm/augmented_llm_azure.py (5 hunks)
  • src/mcp_agent/workflows/llm/augmented_llm_google.py (4 hunks)
  • src/mcp_agent/workflows/llm/augmented_llm_openai.py (6 hunks)
  • tests/executor/test_errors.py (1 hunks)
  • tests/workflows/test_llm_provider_errors.py (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-07-22T18:59:49.368Z
Learnt from: CR
PR: lastmile-ai/mcp-agent#0
File: examples/usecases/reliable_conversation/CLAUDE.md:0-0
Timestamp: 2025-07-22T18:59:49.368Z
Learning: Applies to examples/usecases/reliable_conversation/examples/reliable_conversation/src/**/*.py : Use mcp-agent's Agent abstraction for ALL LLM interactions, including quality evaluation, to ensure consistent tool access, logging, and error handling.

Applied to files:

  • src/mcp_agent/data/templates/README_basic.md
  • src/mcp_agent/data/templates/README_server.md
📚 Learning: 2025-09-05T14:31:48.139Z
Learnt from: rholinshead
PR: lastmile-ai/mcp-agent#414
File: src/mcp_agent/logging/logger.py:18-19
Timestamp: 2025-09-05T14:31:48.139Z
Learning: In the mcp-agent logging module (src/mcp_agent/logging/logger.py), temporalio should be imported lazily with try/except ImportError to avoid making it a hard dependency. Use temporalio.workflow.in_workflow() instead of isinstance checks on internal classes like _WorkflowInstanceImpl.

Applied to files:

  • src/mcp_agent/executor/errors.py
  • src/mcp_agent/executor/temporal/__init__.py
🧬 Code graph analysis (19)
src/mcp_agent/workflows/llm/augmented_llm_google.py (2)
src/mcp_agent/app.py (2)
  • executor (199-200)
  • workflow_task (1172-1254)
src/mcp_agent/executor/errors.py (1)
  • to_application_error (70-90)
src/mcp_agent/data/examples/cloud/agent_factory/run_worker.py (2)
src/mcp_agent/app.py (2)
  • executor (199-200)
  • run (526-549)
src/mcp_agent/executor/temporal/__init__.py (1)
  • create_temporal_worker_for_app (580-640)
tests/workflows/test_llm_provider_errors.py (4)
src/mcp_agent/executor/errors.py (1)
  • WorkflowApplicationError (29-67)
src/mcp_agent/workflows/llm/augmented_llm_openai.py (1)
  • _execute_openai_request (100-106)
src/mcp_agent/workflows/llm/augmented_llm_azure.py (1)
  • _raise_non_retryable_azure (559-569)
src/mcp_agent/workflows/llm/augmented_llm_anthropic.py (1)
  • _execute_anthropic_async (124-128)
src/mcp_agent/data/examples/cloud/agent_factory/custom_tasks.py (1)
src/mcp_agent/app.py (1)
  • workflow_task (1172-1254)
examples/cloud/agent_factory/run_worker.py (2)
src/mcp_agent/app.py (2)
  • executor (199-200)
  • run (526-549)
src/mcp_agent/executor/temporal/__init__.py (1)
  • create_temporal_worker_for_app (580-640)
src/mcp_agent/app.py (1)
src/mcp_agent/config.py (1)
  • to_temporal_kwargs (679-691)
tests/executor/test_errors.py (1)
src/mcp_agent/executor/errors.py (3)
  • WorkflowApplicationError (29-67)
  • to_application_error (70-90)
  • workflow_details (61-67)
src/mcp_agent/data/templates/agent_factory_run_worker.py (5)
src/mcp_agent/app.py (3)
  • executor (199-200)
  • logger (227-244)
  • run (526-549)
src/mcp_agent/executor/temporal/__init__.py (1)
  • create_temporal_worker_for_app (580-640)
examples/basic/agent_factory/run_worker.py (1)
  • main (15-18)
examples/cloud/agent_factory/run_worker.py (1)
  • main (14-17)
src/mcp_agent/data/examples/cloud/agent_factory/run_worker.py (1)
  • main (14-17)
examples/basic/agent_factory/main.py (3)
src/mcp_agent/core/context.py (5)
  • mcp (115-116)
  • fastmcp (119-134)
  • Context (65-323)
  • name (195-198)
  • description (201-204)
src/mcp_agent/app.py (7)
  • MCPApp (65-1331)
  • workflows (215-216)
  • async_tool (1018-1018)
  • async_tool (1021-1031)
  • async_tool (1033-1123)
  • context (183-188)
  • run (526-549)
src/mcp_agent/workflows/factory.py (2)
  • create_router_llm (130-215)
  • load_agent_specs_from_file (856-867)
src/mcp_agent/workflows/llm/augmented_llm_azure.py (2)
src/mcp_agent/app.py (3)
  • executor (199-200)
  • workflow_task (1172-1254)
  • logger (227-244)
src/mcp_agent/executor/errors.py (1)
  • to_application_error (70-90)
src/mcp_agent/data/examples/cloud/agent_factory/main.py (6)
src/mcp_agent/core/context.py (5)
  • mcp (115-116)
  • fastmcp (119-134)
  • Context (65-323)
  • name (195-198)
  • description (201-204)
src/mcp_agent/app.py (8)
  • MCPApp (65-1331)
  • workflows (215-216)
  • async_tool (1018-1018)
  • async_tool (1021-1031)
  • async_tool (1033-1123)
  • context (183-188)
  • executor (199-200)
  • run (526-549)
src/mcp_agent/workflows/factory.py (2)
  • create_router_llm (130-215)
  • load_agent_specs_from_file (856-867)
examples/cloud/agent_factory/custom_tasks.py (1)
  • knowledge_base_lookup_task (47-73)
src/mcp_agent/data/examples/cloud/agent_factory/custom_tasks.py (1)
  • knowledge_base_lookup_task (47-73)
src/mcp_agent/workflows/llm/augmented_llm_openai.py (1)
  • generate_str (424-462)
src/mcp_agent/cli/commands/init.py (1)
src/mcp_agent/cli/commands/config.py (1)
  • _load_template (36-49)
src/mcp_agent/workflows/llm/augmented_llm_openai.py (3)
src/mcp_agent/app.py (2)
  • executor (199-200)
  • workflow_task (1172-1254)
src/mcp_agent/executor/errors.py (1)
  • to_application_error (70-90)
src/mcp_agent/utils/common.py (1)
  • ensure_serializable (42-53)
examples/basic/agent_factory/run_worker.py (2)
src/mcp_agent/executor/temporal/__init__.py (1)
  • create_temporal_worker_for_app (580-640)
examples/basic/agent_factory/main.py (1)
  • main (35-40)
src/mcp_agent/data/templates/agent_factory.py (3)
src/mcp_agent/core/context.py (5)
  • mcp (115-116)
  • fastmcp (119-134)
  • Context (65-323)
  • name (195-198)
  • description (201-204)
src/mcp_agent/app.py (7)
  • MCPApp (65-1331)
  • workflows (215-216)
  • async_tool (1018-1018)
  • async_tool (1021-1031)
  • async_tool (1033-1123)
  • context (183-188)
  • run (526-549)
src/mcp_agent/workflows/factory.py (2)
  • create_router_llm (130-215)
  • load_agent_specs_from_file (856-867)
src/mcp_agent/executor/temporal/__init__.py (1)
src/mcp_agent/app.py (3)
  • logger (227-244)
  • context (183-188)
  • _register_global_workflow_tasks (1262-1331)
src/mcp_agent/workflows/llm/augmented_llm_anthropic.py (3)
src/mcp_agent/app.py (3)
  • executor (199-200)
  • workflow_task (1172-1254)
  • config (191-192)
src/mcp_agent/executor/errors.py (1)
  • to_application_error (70-90)
src/mcp_agent/utils/common.py (1)
  • ensure_serializable (42-53)
examples/cloud/agent_factory/custom_tasks.py (1)
src/mcp_agent/app.py (1)
  • workflow_task (1172-1254)
examples/cloud/agent_factory/main.py (6)
src/mcp_agent/core/context.py (4)
  • mcp (115-116)
  • fastmcp (119-134)
  • Context (65-323)
  • name (195-198)
src/mcp_agent/server/app_server.py (2)
  • app (244-246)
  • create_mcp_server_for_app (504-2042)
src/mcp_agent/app.py (5)
  • MCPApp (65-1331)
  • workflows (215-216)
  • context (183-188)
  • executor (199-200)
  • run (526-549)
src/mcp_agent/workflows/factory.py (2)
  • create_router_llm (130-215)
  • load_agent_specs_from_file (856-867)
examples/cloud/agent_factory/custom_tasks.py (1)
  • knowledge_base_lookup_task (47-73)
src/mcp_agent/executor/temporal/__init__.py (1)
  • execute (240-256)
🪛 LanguageTool
src/mcp_agent/data/templates/README_factory.md

[uncategorized] ~48-~48: The official name of this software platform is spelled with a capital “H”.
Context: ...p-agent cloud account through Google or Github. Set up your mcp-agent cloud API Key a...

(GITHUB)

🪛 markdownlint-cli2 (0.18.1)
src/mcp_agent/data/templates/README_factory.md

83-83: Bare URL used

(MD034, no-bare-urls)

🔇 Additional comments (56)
examples/cloud/agent_factory/mcp_agent.secrets.yaml.example (1)

1-2: Approve: Placeholder template is correctly formatted.

This secrets file serves its purpose as a template for users to fill in their OpenAI API key. The structure and naming convention are appropriate.

src/mcp_agent/data/examples/cloud/agent_factory/mcp_agent.secrets.yaml.example (1)

1-2: Verify: File duplication—ensure sync between mirror locations.

This file is identical to examples/cloud/agent_factory/mcp_agent.secrets.yaml.example. If these are intended to remain in sync (e.g., for package distribution), confirm that a build process keeps them synchronized or document the intent of maintaining separate copies.

src/mcp_agent/data/examples/cloud/agent_factory/requirements.txt (1)

1-6: Verify: Relative file path resolution and directory structure.

The local path reference file://../../../ on line 2 relies on relative directory navigation. Confirm that the path resolves correctly from src/mcp_agent/data/examples/cloud/agent_factory/ to the repository root where mcp-agent source is located. If this example is copied or installed elsewhere, this relative path may break.

Consider documenting the expected directory structure or using an absolute path if the example may be distributed independently.

examples/mcp_agent_server/temporal/README.md (1)

110-115: Approve: Documentation clarification is clear and helpful.

The YAML formatting fix (lines 110–113) and the explanation of workflow_task_modules (line 115) are accurate and actionable for users setting up custom workflow tasks with Temporal.

examples/cloud/agent_factory/requirements.txt (1)

1-6: Verify: File duplication—ensure sync between mirror locations.

This file is identical to src/mcp_agent/data/examples/cloud/agent_factory/requirements.txt. Confirm whether these are intended as separate copies (e.g., for dual distribution paths) or if a build/sync process should keep them aligned. If maintained separately, document why to prevent accidental divergence.

src/mcp_agent/data/examples/cloud/temporal/mcp_agent.config.yaml (1)

6-14: Approve: Optional configuration examples are well-documented.

The commented sections for workflow_task_modules (lines 7–9) and workflow_task_retry_policies (lines 11–14) provide clear, minimal examples for users who need to preload custom task modules or override retry behavior. The formatting and structure follow YAML conventions correctly.

src/mcp_agent/data/examples/cloud/temporal/README.md (1)

77-77: Approve: Documentation on workflow module preloading is accurate.

Line 77 clearly explains how users can enable pre-import of custom @workflow_task modules by uncommenting the workflow_task_modules list in the config file. This guidance aligns with the new configuration options introduced in the PR.

examples/mcp_agent_server/temporal/mcp_agent.config.yaml (1)

7-14: Approve: Optional configuration sections mirror the pattern correctly.

The commented sections for workflow_task_modules and workflow_task_retry_policies are consistent with the new schema additions. The examples are clear and the relative schema reference on line 2 is correct for this directory location.

src/mcp_agent/data/templates/mcp_agent.config.yaml (1)

9-16: LGTM! Clear template configuration.

The commented configuration blocks provide helpful examples for users who want to preload custom workflow tasks and configure retry policies. The structure is clear and the placeholder names appropriately indicate user customization is required.

examples/cloud/temporal/mcp_agent.config.yaml (1)

7-14: LGTM! Consistent optional configuration.

The commented configuration blocks align with the template and clearly mark these features as optional. Good placement before the temporal section for logical grouping.

src/mcp_agent/data/examples/cloud/agent_factory/agents.yaml (1)

1-17: LGTM! Well-defined agent specifications.

The agent definitions are clear and well-structured. The comment explaining the relationship to inline configuration in mcp_agent.config.yaml is helpful. The distinction between the two agents (one with fetch capabilities, one without) is appropriate for demonstrating different use cases.

src/mcp_agent/data/templates/agent_factory_run_worker.py (1)

1-25: LGTM! Clean worker template.

The worker script follows the established pattern seen in other examples and provides a clear, minimal template for Temporal workers. The structure is consistent with examples/basic/agent_factory/run_worker.py and properly uses the create_temporal_worker_for_app context manager.

docs/advanced/temporal.mdx (2)

68-76: LGTM! Clear configuration example.

The YAML example clearly demonstrates the new configuration options with a concrete use case. The comment on Lines 84-89 explaining preloading behavior is helpful and sets clear expectations for users.


84-94: Comprehensive explanation of retry policies and error handling.

The documentation clearly explains:

  • When to use workflow_task_modules for preloading
  • How retry policies work with wildcards
  • Provider-specific non-retryable error types

The mention of WorkflowApplicationError as a fallback for asyncio executor is particularly useful for users who may not have the Temporal SDK installed.

examples/cloud/agent_factory/run_worker.py (1)

1-21: LGTM! Clean worker implementation.

The worker script follows the established pattern and matches the template provided in src/mcp_agent/data/templates/agent_factory_run_worker.py. The implementation is clean and consistent with other examples in the codebase.

tests/executor/test_errors.py (3)

6-10: LGTM!

The test correctly validates that WorkflowApplicationError captures type and non_retryable attributes.


13-26: LGTM!

The parametrized test appropriately validates that WorkflowApplicationError accepts additional kwargs (like details) and handles message extraction across different Exception implementations.


29-42: LGTM!

The test correctly validates that to_application_error preserves custom error attributes when converting to WorkflowApplicationError.

examples/cloud/agent_factory/agents.yaml (1)

1-18: LGTM!

The agent specifications are well-structured with clear instructions and appropriate server configurations. The comment helpfully documents alternative configuration approaches.

src/mcp_agent/data/examples/cloud/agent_factory/run_worker.py (2)

8-8: Relative import is consistent with example patterns.

The relative import from main import app follows the pattern used in other examples (e.g., examples/basic/agent_factory/run_worker.py). This is appropriate for example code where files are co-located.


14-17: LGTM!

The worker initialization and execution logic correctly uses the async context manager pattern and runs the worker in the main loop.

examples/basic/agent_factory/run_worker.py (1)

1-22: LGTM!

The worker script follows the same pattern as other examples in the PR, maintaining consistency across the codebase.

docs/mcp-agent-sdk/advanced/durable-agents.mdx (1)

153-192: LGTM!

The documentation is comprehensive and well-structured, providing clear examples of workflow task module preloading and retry policy configuration. The provider-specific error type guidance is especially valuable.

src/mcp_agent/app.py (3)

1125-1170: LGTM!

The retry policy matching logic is well-implemented with appropriate priority levels (exact > suffix > prefix > global) and length-based tiebreaking. The suffix matching at line 1163 correctly uses f".{key}" to prevent partial matches.


1212-1215: LGTM!

The configured retry policy is correctly applied by merging with existing policy, ensuring overrides take precedence while preserving any explicitly set values.


1291-1297: LGTM!

The retry policy override is consistently applied for global workflow tasks, with proper metadata updates to ensure the function carries the final policy configuration.

examples/basic/agent_factory/main.py (2)

22-32: LGTM!

The agent spec loading and router creation logic is correct, with proper path resolution using Path(__file__).resolve().parent to locate the agents.yaml file.


35-40: LGTM!

The main function correctly initializes the app and executes the route_prompt workflow with proper context passing.

examples/cloud/agent_factory/README.md (1)

1-69: LGTM!

The README provides comprehensive documentation with clear quick-start instructions and a helpful explanation of the workflow task preloading and execution flow.

examples/basic/agent_factory/README.md (1)

3-12: Docs update reads well.

References to mcp_agent.workflows.factory, main.py, and run_worker.py look correct.

src/mcp_agent/data/templates/README_server.md (1)

33-39: Install flow simplification is good.

uv init + uv add "mcp-agent[openai]" + uv run main.py is clear.

examples/cloud/agent_factory/custom_tasks.py (2)

10-43: LGTM: embedded KB content and structure.

Clear, deterministic snippets suitable for enrichment.


46-48: Decorator import verified; no issues found.

The import from mcp_agent.executor.workflow_task import workflow_task at line 7 is correct and resolves to the intended public decorator. The decorator definition exists at src/mcp_agent/executor/workflow_task.py and is properly exported. Multiple files across the codebase use the same import successfully, confirming this is the standard pattern. No import-time failures will occur.

src/mcp_agent/data/templates/README_basic.md (2)

17-26: Quick start is concise and accurate.


68-69: Deploy command update is appropriate.

src/mcp_agent/workflows/llm/augmented_llm_azure.py (4)

559-570: Non‑retryable error shim is correct and aligns with to_application_error.


33-40: Imports and error mapping look correct.

Good consolidation of Azure OpenAI exception classes and non‑retryable codes.

Also applies to: 75-85


574-578: Per‑task retry policy is applied consistently.

@workflow_task(retry_policy={"maximum_attempts": 3}) on both tasks aligns with config overrides.

Also applies to: 633-637


54-55: No issues found with the import path.

The workflow_task decorator imported from mcp_agent.executor.workflow_task is a documented, static alternative to MCPApp.workflow_task. The module explicitly states "Wherever possible it is preferred to use @app.workflow_task in MCPApp", but the static decorator is a supported pattern for cases without direct app access. Both register to the same GlobalWorkflowTaskRegistry, and there is no duplicate registration risk from the import pattern itself. The code is correctly using a documented alternative approach.

tests/workflows/test_llm_provider_errors.py (2)

32-50: Rate limit propagation test is spot on.

Keeps rate limits retryable by not wrapping in WorkflowApplicationError. LGTM.


52-62: Azure path assertion is clear.

Message includes status code and marks non-retryable. LGTM.

src/mcp_agent/workflows/llm/augmented_llm_openai.py (2)

100-107: Wrapper centralizes translation and preserves retryability semantics. LGTM.

Non‑retryables become WorkflowApplicationError; rate limits and other transient errors propagate. Good separation.


976-977: Good: both task paths use the shared executor wrapper.

Ensures consistent error translation and simplifies testing.

Also applies to: 1028-1028

src/mcp_agent/workflows/llm/augmented_llm_anthropic.py (3)

75-81: Good mapping of non-retryable Anthropic errors.

This aligns with surfacing provider misconfig as non-retryable ApplicationErrors. Looks correct.


124-129: Async wrapper correctly translates Anthropic errors to WorkflowApplicationError.

Centralizing error translation here is solid and keeps activity code clean.


156-167: Model identifiers verified as current and correct.

The three model identifiers in the code match current Anthropic documentation:

  • Standard API: claude-sonnet-4-20250514
  • Bedrock: anthropic.claude-sonnet-4-20250514-v1:0
  • Vertex AI: claude-sonnet-4@20250514

The code is also well-structured to handle updates—configuration-driven provider selection and support for config.anthropic.default_model override provides a maintenance path as these identifiers evolve.

src/mcp_agent/cli/commands/init.py (1)

421-427: Nice UX: clear next steps for Temporal.

Instructions are concise and correct; good addition.

src/mcp_agent/data/templates/agents.yaml (2)

5-15: LGTM: list-of-agent specs with in‑block names.

Structure matches factory loader expectations and keeps YAML simple.


53-61: General assistant with empty server_names is fine.

Router can still route prompts even without tool servers.

examples/cloud/agent_factory/main.py (2)

34-41: Good: execute KB task via executor with error propagation.

Pattern is correct and works under both asyncio and Temporal engines.


67-71: SSE entrypoint looks good.

Cleanly runs the app via create_mcp_server_for_app and SSE loop.

src/mcp_agent/executor/temporal/__init__.py (4)

52-59: Sensible default module preload set.

Covers major LLM providers to ensure tasks are visible to the worker.


61-69: Helpful extras mapping and warning.

Great UX for missing optional dependencies; message is actionable.


525-577: Preloading workflow-task modules before building the worker is the right call.

Covers both global config and temporal‑scoped config; best‑effort and safe.


591-619: Registering global tasks post-preload ensures activities are discoverable.

Solid integration point with MCPApp._register_global_workflow_tasks (applies retry overrides).

src/mcp_agent/config.py (1)

11-11: Import timedelta — OK

Needed for the new retry-policy coercion; no issues.

Comment on lines +643 to +692
class WorkflowTaskRetryPolicy(BaseModel):
"""
Declarative retry policy for workflow tasks / activities (mirrors Temporal RetryPolicy fields).
Durations can be specified either as seconds (number) or ISO8601 timedelta strings; both are
coerced to datetime.timedelta instances.
"""

maximum_attempts: int | None = None
initial_interval: timedelta | float | str | None = None
backoff_coefficient: float | None = None
maximum_interval: timedelta | float | str | None = None
non_retryable_error_types: List[str] | None = None

model_config = ConfigDict(extra="forbid")

@field_validator("initial_interval", "maximum_interval", mode="before")
@classmethod
def _coerce_interval(cls, value):
if value is None:
return None
if isinstance(value, timedelta):
return value
if isinstance(value, (int, float)):
return timedelta(seconds=value)
if isinstance(value, str):
try:
seconds = float(value)
return timedelta(seconds=seconds)
except Exception:
raise TypeError(
"Retry interval strings must be parseable as seconds."
) from None
raise TypeError(
"Retry interval must be seconds (number or string) or a timedelta."
)

def to_temporal_kwargs(self) -> Dict[str, Any]:
data: Dict[str, Any] = {}
if self.maximum_attempts is not None:
data["maximum_attempts"] = self.maximum_attempts
if self.initial_interval is not None:
data["initial_interval"] = self.initial_interval
if self.backoff_coefficient is not None:
data["backoff_coefficient"] = self.backoff_coefficient
if self.maximum_interval is not None:
data["maximum_interval"] = self.maximum_interval
if self.non_retryable_error_types:
data["non_retryable_error_types"] = list(self.non_retryable_error_types)
return data

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Retry policy: doc/behavior mismatch and missing value-range checks.

  • Docstring says ISO8601 strings, but validator only accepts numeric strings; either support ISO8601 or update the doc.
  • Add basic validation: maximum_attempts >= 0, intervals > 0, and maximum_interval >= initial_interval if both set; backoff_coefficient > 0.

Apply this minimal doc/validation update:

 class WorkflowTaskRetryPolicy(BaseModel):
-    """
-    Declarative retry policy for workflow tasks / activities (mirrors Temporal RetryPolicy fields).
-    Durations can be specified either as seconds (number) or ISO8601 timedelta strings; both are
-    coerced to datetime.timedelta instances.
-    """
+    """
+    Declarative retry policy for workflow tasks/activities (mirrors Temporal RetryPolicy).
+    Durations may be provided as timedelta objects, numbers (seconds), or strings parseable as seconds
+    (e.g., "0.5"). If ISO8601 strings are desired, extend the parser in _coerce_interval accordingly.
+    """

-    maximum_attempts: int | None = None
+    maximum_attempts: int | None = Field(default=None, ge=0)
     initial_interval: timedelta | float | str | None = None
-    backoff_coefficient: float | None = None
+    backoff_coefficient: float | None = Field(default=None, gt=0)
     maximum_interval: timedelta | float | str | None = None
     non_retryable_error_types: List[str] | None = None
@@
     def to_temporal_kwargs(self) -> Dict[str, Any]:
         data: Dict[str, Any] = {}
@@
         if self.non_retryable_error_types:
-            data["non_retryable_error_types"] = list(self.non_retryable_error_types)
+            # de-dup while preserving order
+            seen = set()
+            data["non_retryable_error_types"] = [
+                e for e in self.non_retryable_error_types if not (e in seen or seen.add(e))
+            ]
         return data
+
+    @model_validator(mode="after")
+    def _validate_intervals(self) -> "WorkflowTaskRetryPolicy":
+        if self.initial_interval is not None and self.initial_interval <= timedelta(0):
+            raise ValueError("initial_interval must be > 0")
+        if self.maximum_interval is not None and self.maximum_interval <= timedelta(0):
+            raise ValueError("maximum_interval must be > 0")
+        if (
+            self.initial_interval is not None
+            and self.maximum_interval is not None
+            and self.maximum_interval < self.initial_interval
+        ):
+            raise ValueError("maximum_interval must be >= initial_interval")
+        return self

Optionally, add ISO8601 parsing (e.g., via isodate) in _coerce_interval if you want to honor the original doc.

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +1134 to +1141
workflow_task_modules: List[str] = Field(default_factory=list)
"""Optional list of modules to import at startup so workflow tasks register globally."""

workflow_task_retry_policies: Dict[str, WorkflowTaskRetryPolicy] = Field(
default_factory=dict
)
"""Optional mapping of activity names (supports '*' and 'prefix*') to retry policies."""

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Top-level workflow_task_*: document match precedence; consider list merge in deep_merge.

  • Please document and enforce precedence for policy lookup: exact match > longest prefix* > "*".
  • Consider concatenating and de-duplicating workflow_task_modules when merging config + secrets, similar to otel.exporters, so secrets don’t overwrite modules defined in config.

To verify the matcher behavior, please run:


🏁 Script executed:

#!/bin/bash
# Inspect the policy selection logic to confirm precedence rules
rg -nP -C3 '_get_configured_retry_policy\s*\(' src | sed -n '1,200p'

Length of output: 1466


I need to see the complete implementation of _get_configured_retry_policy. Let me inspect it fully along with the merge logic for modules.


---

🏁 Script executed:

```shell
#!/bin/bash
# Get full implementation of _get_configured_retry_policy
rg -nP -A20 'def _get_configured_retry_policy\(' src/mcp_agent/app.py

Length of output: 952


🏁 Script executed:

#!/bin/bash
# Search for deep_merge or merge functions handling workflow_task modules
rg -nP -C5 'workflow_task_modules' src | head -100

Length of output: 7841


🏁 Script executed:

#!/bin/bash
# Search for otel.exporters pattern mentioned in review
rg -nP -C3 'otel.*exporter' src | head -80

Length of output: 3360


I need to see the complete retry policy matching logic and the full merge function context.


---

🏁 Script executed:

```shell
#!/bin/bash
# Get complete _get_configured_retry_policy implementation including priority/length logic
rg -nP -A40 'def _get_configured_retry_policy\(' src/mcp_agent/app.py

Length of output: 1708


🏁 Script executed:

#!/bin/bash
# Get the merge function and context around line 1320 to understand deep_merge
rg -nP -B20 'current_path == \("otel", "exporters"\)' src/mcp_agent/config.py | head -80

Length of output: 997


🏁 Script executed:

#!/bin/bash
# Find where deep_merge is called for top-level config merging
rg -nP 'def.*deep_merge|deep_merge\(' src/mcp_agent/config.py | head -20

Length of output: 318


I need to check if both workflow_task_modules fields need the merge fix or just the one at line 1134 in MCPAgentSettings.


---

🏁 Script executed:

```shell
#!/bin/bash
# Check if WorkerSettings is also part of config merging or if only MCPAgentSettings is affected
rg -nP 'class WorkerSettings|class MCPAgentSettings' src/mcp_agent/config.py -A1

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Find where deep_merge is called to understand the context better
rg -nP -B5 -A5 'deep_merge\(merged_settings' src/mcp_agent/config.py

Length of output: 1161


🏁 Script executed:

#!/bin/bash
# Check the structure - is MCPAgentSettings the top-level config?
rg -nP 'MCPAgentSettings' src/mcp_agent/config.py | head -30

Length of output: 0



---

🏁 Script executed:

```shell
#!/bin/bash
# Get context around line 1134 to see what class it belongs to
sed -n '1100,1150p' src/mcp_agent/config.py | head -70

Length of output: 2104


🏁 Script executed:

#!/bin/bash
# Find the class definition that contains line 1134
rg -nP '^class ' src/mcp_agent/config.py | grep -E '1[01][0-9]{2}:'

Length of output: 111


🏁 Script executed:

#!/bin/bash
# Get all class definitions to understand hierarchy
rg -nP '^class ' src/mcp_agent/config.py

Length of output: 1341


Document retry policy precedence rule and add workflow_task_modules list concatenation in config merge.

Two issues require fixes:

  1. _get_configured_retry_policy method (app.py:1125): The implementation correctly enforces precedence rules via tuple comparison (exact match > longest prefix* > wildcard), but lacks a docstring. Add documentation explaining that policies are matched in order of priority: exact name (priority 3) > longest prefix* match (priority 1) > "*" wildcard (priority 0).

  2. deep_merge function (config.py:1303): Currently special-cases only ("otel", "exporters") for list concatenation. Add similar handling for ("workflow_task_modules",) and ("temporal", "workflow_task_modules") to concatenate and deduplicate modules from config and secrets files instead of letting secrets overwrite config values.

🤖 Prompt for AI Agents
In src/mcp_agent/config.py around lines 1134-1141 and the deep_merge
implementation near line 1303, and in src/mcp_agent/app.py around line 1125, add
a docstring to _get_configured_retry_policy in app.py describing the precedence:
exact activity name match has highest priority (priority 3), longest prefix*
match next (priority 1), and "*" wildcard lowest (priority 0); and update
deep_merge in config.py to special-case keys ("workflow_task_modules",) and
("temporal","workflow_task_modules") so that when merging config and secrets you
concatenate lists and deduplicate module names (preserve order from config
first, then append non-duplicates from secrets) instead of letting secrets
overwrite config.

Comment on lines +48 to +55
to authenticate to mcp-agent cloud. You will be redirected to the login page to create an mcp-agent cloud account through Google or Github.

Set up your mcp-agent cloud API Key and copy & paste it into your terminal

```bash
INFO: Directing to MCP Agent Cloud API login...
Please enter your API key 🔑:
```
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Capitalize “GitHub”.

Apply:

-You will be redirected to the login page to create an mcp-agent cloud account through Google or Github.
+You will be redirected to the login page to create an mcp-agent cloud account through Google or GitHub.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
to authenticate to mcp-agent cloud. You will be redirected to the login page to create an mcp-agent cloud account through Google or Github.
Set up your mcp-agent cloud API Key and copy & paste it into your terminal
```bash
INFO: Directing to MCP Agent Cloud API login...
Please enter your API key 🔑:
```
to authenticate to mcp-agent cloud. You will be redirected to the login page to create an mcp-agent cloud account through Google or GitHub.
Set up your mcp-agent cloud API Key and copy & paste it into your terminal
🧰 Tools
🪛 LanguageTool

[uncategorized] ~48-~48: The official name of this software platform is spelled with a capital “H”.
Context: ...p-agent cloud account through Google or Github. Set up your mcp-agent cloud API Key a...

(GITHUB)

🤖 Prompt for AI Agents
In src/mcp_agent/data/templates/README_factory.md around lines 48 to 55, the
vendor name "Github" is not capitalized correctly; change "Github" to "GitHub"
in the sentence "create an mcp-agent cloud account through Google or Github." so
it reads "Google or GitHub." and save the file.

Comment on lines +60 to +68
@property
def workflow_details(self):
details = getattr(self, "details", None)
if details:
if isinstance(details, tuple):
return list(details)
return details
return self._workflow_details_fallback

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Don’t drop empty details; check for None instead of truthiness.

Apply:

-    def workflow_details(self):
-        details = getattr(self, "details", None)
-        if details:
+    def workflow_details(self):
+        details = getattr(self, "details", None)
+        if details is not None:
             if isinstance(details, tuple):
                 return list(details)
             return details
         return self._workflow_details_fallback
🤖 Prompt for AI Agents
In src/mcp_agent/executor/errors.py around lines 60 to 68, the current property
uses a truthiness check ("if details:") which drops empty but valid details
(e.g., empty dict/list/tuple); change the check to "if details is not None:" so
None triggers the fallback but empty containers are returned (still convert
tuple to list as before); update that single condition and keep the rest of the
logic unchanged.

Comment on lines 42 to 48
_NON_RETRYABLE_GOOGLE_ERRORS = (
google_exceptions.InvalidArgument,
google_exceptions.FailedPrecondition,
google_exceptions.PermissionDenied,
google_exceptions.NotFound,
google_exceptions.Unauthenticated,
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Crash on import when google_exceptions is unavailable (unguarded attribute access).

The first assignment dereferences None; keep only the guarded definition below.

Apply:

-_NON_RETRYABLE_GOOGLE_ERRORS = (
-    google_exceptions.InvalidArgument,
-    google_exceptions.FailedPrecondition,
-    google_exceptions.PermissionDenied,
-    google_exceptions.NotFound,
-    google_exceptions.Unauthenticated,
-)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
_NON_RETRYABLE_GOOGLE_ERRORS = (
google_exceptions.InvalidArgument,
google_exceptions.FailedPrecondition,
google_exceptions.PermissionDenied,
google_exceptions.NotFound,
google_exceptions.Unauthenticated,
)
🤖 Prompt for AI Agents
In src/mcp_agent/workflows/llm/augmented_llm_google.py around lines 42 to 48,
the top-level assignment to _NON_RETRYABLE_GOOGLE_ERRORS dereferences
google_exceptions (which can be None) causing an import-time crash; remove this
unguarded assignment and rely only on the guarded definition later in the file
(or replace with a safe conditional/try-except that sets the tuple only if
google_exceptions is available).

- `load_and_route.py`: load agents and route via an LLM
- `main.py`: load agents, register the `route_prompt` tool, and route requests
- `run_worker.py`: Temporal worker (set `execution_engine: temporal` and run this in another terminal)
- `load_and_route.py`: legacy script version of the routing demo
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to maintain legacy support for examples?

"""Route customer-facing questions and seed the LLM with KB context."""
context = app_ctx or app.context

kb_snippets = await context.executor.execute(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just trying to wrap my head around the task modules change since the previous issue was that the activities for the agents created from the factory were not registered; this kb_snippet part here is an unrelated delta just to show the ability to execute them directly? i.e. if the kb_snippets part here were removed (as well as the import from custom_tasks in this file), the agents from the router call would still work now right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this is unrelated to that. I wanted to show executing a workflow task


logger = get_logger(__name__)

DEFAULT_TEMPORAL_WORKFLOW_TASK_MODULES: tuple[str, ...] = (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I think this answers my above question -- this is what would make the old example work as-is

Comment on lines 42 to 59
_NON_RETRYABLE_GOOGLE_ERRORS = (
google_exceptions.InvalidArgument,
google_exceptions.FailedPrecondition,
google_exceptions.PermissionDenied,
google_exceptions.NotFound,
google_exceptions.Unauthenticated,
)

if google_exceptions:
_NON_RETRYABLE_GOOGLE_ERRORS = (
google_exceptions.InvalidArgument,
google_exceptions.FailedPrecondition,
google_exceptions.PermissionDenied,
google_exceptions.NotFound,
google_exceptions.Unauthenticated,
)
else: # pragma: no cover
_NON_RETRYABLE_GOOGLE_ERRORS = tuple()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Legit

@rholinshead rholinshead mentioned this pull request Oct 28, 2025
)


@workflow_task(name="cloud_agent_factory.knowledge_base_lookup")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@workflow_task(name="cloud_agent_factory.knowledge_base_lookup")
@workflow_task(name="cloud_agent_factory.knowledge_base_lookup_task")

Should this add the _task suffix to match the locations referencing it?

Copy link
Member

@rholinshead rholinshead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good, assuming testing is successful. PTAL at the graphite-app comments since some of them seem reasonable

Comment on lines +621 to +622
except Exception as retry_error:
_raise_non_retryable_azure(retry_error)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The broad Exception handler incorrectly marks all retry failures as non-retryable. This will prevent legitimate retries for transient errors like network timeouts or rate limits during the fallback attempt.

Impact: Temporary failures (timeouts, rate limits, etc.) during the 400-status fallback retry will be marked as non-retryable, causing workflows to fail permanently instead of retrying.

Fix:

except HttpResponseError as retry_error:
    if retry_error.status_code in _NON_RETRYABLE_AZURE_STATUS_CODES:
        _raise_non_retryable_azure(retry_error, retry_error.status_code)
    raise
except _NON_RETRYABLE_AZURE_OPENAI_ERRORS as retry_error:
    _raise_non_retryable_azure(retry_error)

This allows retryable errors to propagate normally while only marking truly non-retryable errors.

Suggested change
except Exception as retry_error:
_raise_non_retryable_azure(retry_error)
except HttpResponseError as retry_error:
if retry_error.status_code in _NON_RETRYABLE_AZURE_STATUS_CODES:
_raise_non_retryable_azure(retry_error, retry_error.status_code)
raise
except _NON_RETRYABLE_AZURE_OPENAI_ERRORS as retry_error:
_raise_non_retryable_azure(retry_error)

Spotted by Graphite Agent

Fix in Graphite


Is this helpful? React 👍 or 👎 to let us know.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (2)
src/mcp_agent/config.py (2)

643-692: Retry policy: doc/behavior mismatch and missing value-range checks

The issues flagged in the previous review remain unaddressed:

  • Docstring claims ISO8601 duration support, but _coerce_interval only accepts numeric strings (seconds)
  • Missing validation: maximum_attempts >= 0, intervals > 0, backoff_coefficient > 0
  • Missing relationship check: maximum_interval >= initial_interval
  • non_retryable_error_types should be deduplicated in to_temporal_kwargs()

Please apply the fixes suggested in the previous review or update the docstring to match the actual behavior.


1134-1140: Document retry policy precedence rule

The workflow_task_retry_policies field lacks documentation on match precedence. Based on the previous review comments, the matching logic in app.py uses: exact activity name (highest priority) > longest prefix* match > "*" wildcard (lowest priority). This should be documented in the docstring here.

Note: The list concatenation issue for workflow_task_modules appears to be addressed in the deep_merge function at lines 1323-1335.

-    workflow_task_retry_policies: Dict[str, WorkflowTaskRetryPolicy] = Field(
-        default_factory=dict
-    )
-    """Optional mapping of activity names (supports '*' and 'prefix*') to retry policies."""
+    workflow_task_retry_policies: Dict[str, WorkflowTaskRetryPolicy] = Field(
+        default_factory=dict
+    )
+    """Optional mapping of activity names (supports '*' and 'prefix*') to retry policies.
+    
+    Match precedence: exact name (highest) > longest prefix* > '*' wildcard (lowest)."""
🧹 Nitpick comments (2)
src/mcp_agent/data/templates/agent_factory.py (1)

15-34: Consider documenting context resolution pattern

The context = app_ctx or app.context pattern (line 20) is valid but could benefit from a brief comment explaining when each path is used:

  • When called from Temporal activities: app.context is available (app initialized by worker)
  • When testing or calling programmatically: pass explicit app_ctx

This helps users understand the dual-usage pattern when adapting the template.

Otherwise, the function implementation is clean and correct.

src/mcp_agent/data/examples/cloud/agent_factory/main.py (1)

27-61: LGTM: route_customer_request implementation

The function correctly:

  • Retrieves knowledge base context via executor.execute
  • Handles errors from the executor (lines 40-41)
  • Falls back gracefully when no KB hits are found (line 43)
  • Loads agent specs and creates an LLM router
  • Enriches the prompt with KB context before routing

For production use, consider making the enriched prompt template (lines 54-59) configurable through settings or parameters, allowing different system instructions without code changes.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bae44bf and 14b5a9d.

📒 Files selected for processing (13)
  • docs/mcp-agent-sdk/advanced/durable-agents.mdx (1 hunks)
  • examples/basic/agent_factory/README.md (2 hunks)
  • examples/basic/agent_factory/main.py (1 hunks)
  • examples/cloud/agent_factory/main.py (1 hunks)
  • examples/cloud/agent_factory/mcp_agent.config.yaml (1 hunks)
  • pyproject.toml (1 hunks)
  • src/mcp_agent/app.py (3 hunks)
  • src/mcp_agent/config.py (4 hunks)
  • src/mcp_agent/data/examples/cloud/agent_factory/main.py (1 hunks)
  • src/mcp_agent/data/examples/cloud/agent_factory/mcp_agent.config.yaml (1 hunks)
  • src/mcp_agent/data/templates/agent_factory.py (1 hunks)
  • src/mcp_agent/executor/errors.py (1 hunks)
  • src/mcp_agent/workflows/llm/augmented_llm_google.py (4 hunks)
✅ Files skipped from review due to trivial changes (1)
  • src/mcp_agent/data/examples/cloud/agent_factory/mcp_agent.config.yaml
🚧 Files skipped from review as they are similar to previous changes (3)
  • docs/mcp-agent-sdk/advanced/durable-agents.mdx
  • examples/cloud/agent_factory/mcp_agent.config.yaml
  • examples/basic/agent_factory/main.py
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: rholinshead
PR: lastmile-ai/mcp-agent#414
File: src/mcp_agent/logging/logger.py:18-19
Timestamp: 2025-09-05T14:31:48.139Z
Learning: In the mcp-agent logging module (src/mcp_agent/logging/logger.py), temporalio should be imported lazily with try/except ImportError to avoid making it a hard dependency. Use temporalio.workflow.in_workflow() instead of isinstance checks on internal classes like _WorkflowInstanceImpl.
📚 Learning: 2025-07-22T18:59:49.368Z
Learnt from: CR
PR: lastmile-ai/mcp-agent#0
File: examples/usecases/reliable_conversation/CLAUDE.md:0-0
Timestamp: 2025-07-22T18:59:49.368Z
Learning: Applies to examples/usecases/reliable_conversation/examples/reliable_conversation/src/**/*.py : Use mcp-agent's Agent abstraction for ALL LLM interactions, including quality evaluation, to ensure consistent tool access, logging, and error handling.

Applied to files:

  • src/mcp_agent/app.py
  • src/mcp_agent/data/templates/agent_factory.py
  • examples/cloud/agent_factory/main.py
  • src/mcp_agent/data/examples/cloud/agent_factory/main.py
  • examples/basic/agent_factory/README.md
  • src/mcp_agent/workflows/llm/augmented_llm_google.py
  • pyproject.toml
📚 Learning: 2025-09-05T14:31:48.139Z
Learnt from: rholinshead
PR: lastmile-ai/mcp-agent#414
File: src/mcp_agent/logging/logger.py:18-19
Timestamp: 2025-09-05T14:31:48.139Z
Learning: In the mcp-agent logging module (src/mcp_agent/logging/logger.py), temporalio should be imported lazily with try/except ImportError to avoid making it a hard dependency. Use temporalio.workflow.in_workflow() instead of isinstance checks on internal classes like _WorkflowInstanceImpl.

Applied to files:

  • src/mcp_agent/app.py
  • examples/cloud/agent_factory/main.py
  • src/mcp_agent/data/examples/cloud/agent_factory/main.py
  • src/mcp_agent/config.py
  • src/mcp_agent/executor/errors.py
📚 Learning: 2025-07-22T18:59:49.368Z
Learnt from: CR
PR: lastmile-ai/mcp-agent#0
File: examples/usecases/reliable_conversation/CLAUDE.md:0-0
Timestamp: 2025-07-22T18:59:49.368Z
Learning: Applies to examples/usecases/reliable_conversation/examples/reliable_conversation/src/utils/config.py : Configuration values such as quality_threshold, max_refinement_attempts, consolidation_interval, and evaluator_model_provider must be loaded from mcp_agent.config.yaml.

Applied to files:

  • src/mcp_agent/app.py
  • examples/basic/agent_factory/README.md
🧬 Code graph analysis (5)
src/mcp_agent/app.py (1)
src/mcp_agent/config.py (1)
  • to_temporal_kwargs (679-691)
src/mcp_agent/data/templates/agent_factory.py (3)
src/mcp_agent/app.py (7)
  • context (183-188)
  • MCPApp (65-1340)
  • workflows (215-216)
  • async_tool (1018-1018)
  • async_tool (1021-1031)
  • async_tool (1033-1123)
  • run (526-549)
src/mcp_agent/core/context.py (3)
  • Context (65-323)
  • name (195-198)
  • description (201-204)
src/mcp_agent/workflows/factory.py (2)
  • create_router_llm (130-215)
  • load_agent_specs_from_file (856-867)
examples/cloud/agent_factory/main.py (6)
src/mcp_agent/app.py (8)
  • context (183-188)
  • MCPApp (65-1340)
  • workflows (215-216)
  • async_tool (1018-1018)
  • async_tool (1021-1031)
  • async_tool (1033-1123)
  • executor (199-200)
  • run (526-549)
src/mcp_agent/core/context.py (3)
  • Context (65-323)
  • name (195-198)
  • description (201-204)
src/mcp_agent/server/app_server.py (2)
  • app (244-246)
  • create_mcp_server_for_app (504-2042)
src/mcp_agent/workflows/factory.py (2)
  • create_router_llm (130-215)
  • load_agent_specs_from_file (856-867)
examples/cloud/agent_factory/custom_tasks.py (1)
  • knowledge_base_lookup_task (47-73)
src/mcp_agent/executor/temporal/__init__.py (1)
  • execute (240-256)
src/mcp_agent/data/examples/cloud/agent_factory/main.py (6)
src/mcp_agent/app.py (8)
  • context (183-188)
  • MCPApp (65-1340)
  • workflows (215-216)
  • async_tool (1018-1018)
  • async_tool (1021-1031)
  • async_tool (1033-1123)
  • executor (199-200)
  • run (526-549)
src/mcp_agent/core/context.py (3)
  • Context (65-323)
  • name (195-198)
  • description (201-204)
src/mcp_agent/workflows/factory.py (2)
  • create_router_llm (130-215)
  • load_agent_specs_from_file (856-867)
examples/cloud/agent_factory/custom_tasks.py (1)
  • knowledge_base_lookup_task (47-73)
src/mcp_agent/data/examples/cloud/agent_factory/custom_tasks.py (1)
  • knowledge_base_lookup_task (47-73)
src/mcp_agent/executor/temporal/__init__.py (1)
  • execute (240-256)
src/mcp_agent/workflows/llm/augmented_llm_google.py (2)
src/mcp_agent/app.py (2)
  • executor (199-200)
  • workflow_task (1181-1263)
src/mcp_agent/executor/errors.py (1)
  • to_application_error (83-103)
🔇 Additional comments (8)
src/mcp_agent/config.py (2)

11-11: LGTM: timedelta import

The import is correctly placed and necessary for the new WorkflowTaskRetryPolicy interval fields.


637-638: LGTM: workflow_task_modules field

The field is correctly defined and documented. This enables preloading workflow task modules before Temporal worker startup, addressing the core PR objective.

src/mcp_agent/data/templates/agent_factory.py (3)

4-4: LGTM: Context import

The import is necessary for the route_prompt function signature and correctly placed.


12-13: LGTM: module-level app instance

Creating the app at module level is the correct pattern, enabling both direct execution and import for Temporal worker registration (addressing the PR's core objective).


36-46: LGTM: clean main() implementation

The main function demonstrates the correct usage pattern:

  • Initializes app with async with app.run()
  • Passes explicit context to route_prompt
  • Clean execution via asyncio.run()

This is good template code.

src/mcp_agent/data/examples/cloud/agent_factory/main.py (3)

16-19: LGTM: robust import pattern

The try/except import pattern addresses the previous review concern about relative imports while also supporting direct script execution. This is better than a relative-only import:

  • Relative import works when imported as a package module
  • Absolute fallback works when run as python main.py

The # pragma: no cover comment is appropriate.


21-24: LGTM: app instance creation

Module-level app with descriptive name and description. This enables Temporal worker import as documented in the PR objectives.


63-70: LGTM: clean main() implementation

The main function follows the same clean pattern as the template:

  • Proper initialization with async with app.run()
  • Explicit context passing
  • Clean execution via asyncio.run()

Comment on lines +113 to 122
mcp_agent = [
"data/*.json",
"data/templates/**/*",
"data/examples/**/*",
"resources/examples/**/*.py",
"resources/examples/**/*.yaml",
"resources/examples/**/*.yml",
"resources/examples/**/*.csv",
"resources/examples/**/mount-point/*.csv"
"resources/examples/**/mount-point/*.csv",
]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

✓ Correct package-data key name and new data includes.

The change from mcp-agent to mcp_agent in line 113 is correct; setuptools package-data keys should match the actual Python package import name (with underscores, not hyphens). The new glob patterns for templates and examples align well with the PR objectives to ship factory templates and a Cloud Agent Factory example.

Please verify that the new directories actually exist in this PR:

  • mcp_agent/data/templates/ and its contents
  • mcp_agent/data/examples/ and its contents

You can confirm these are included by checking the directory structure:


🏁 Script executed:

#!/bin/bash
# Verify that the new template and example directories exist
find . -type d -name "templates" -o -name "examples" | grep -E "mcp_agent/(data|resources)/(templates|examples)" | head -20

Length of output: 60


🏁 Script executed:

#!/bin/bash
# Check for files in the template and example directories
echo "=== Files in data/templates/ ===" 
find ./src/mcp_agent/data/templates -type f 2>/dev/null | head -20

echo ""
echo "=== Files in data/examples/ ===" 
find ./src/mcp_agent/data/examples -type f 2>/dev/null | head -20

echo ""
echo "=== Check for mount-point directory ===" 
find ./src/mcp_agent/resources/examples -type d -name "mount-point" 2>/dev/null

Length of output: 2483


🏁 Script executed:

#!/bin/bash
# Check if resources/examples directory exists
echo "=== Check for resources/examples directory ===" 
find ./src/mcp_agent/resources -type d 2>/dev/null | head -20

echo ""
echo "=== Look for mount-point directory anywhere ===" 
find ./src/mcp_agent -type d -name "mount-point" 2>/dev/null

echo ""
echo "=== Count files in resources/examples if it exists ===" 
find ./src/mcp_agent/resources/examples -type f 2>/dev/null | wc -l

Length of output: 153


✗ Critical issue: Non-existent directory patterns in package data configuration.

Lines 119-122 reference resources/examples/ paths that do not exist in the codebase:

  • resources/examples/**/*.py
  • resources/examples/**/*.yaml
  • resources/examples/**/*.yml
  • resources/examples/**/*.csv
  • resources/examples/**/mount-point/*.csv

The resources/ directory does not exist, and neither does the mount-point subdirectory. These glob patterns will silently match zero files during package installation, which means intended package data will not be included. Either:

  1. Create the resources/examples/ directory structure and add the intended files, or
  2. Remove these non-existent path patterns from the configuration

The data/templates/ and data/examples/ patterns are correct and will properly include files.

🤖 Prompt for AI Agents
In pyproject.toml around lines 113 to 122, the package_data globs referencing
resources/examples/**/*.py, .yaml, .yml, .csv and
resources/examples/**/mount-point/*.csv point to a non-existent resources/
directory (and mount-point subdir), so update the file to either create the
resources/examples/ directory tree and add the intended files (including the
mount-point folder and CSVs) or remove those five non-existent glob patterns
from the mcp_agent list; after making the change, verify packaging includes the
expected files by building the package (e.g., python -m build) or listing
included files to confirm the globs match.

Comment on lines 1299 to +1307

override_policy = self._get_configured_retry_policy(activity_name)
if override_policy:
existing_policy = metadata.get("retry_policy") or {}
metadata["retry_policy"] = {**existing_policy, **override_policy}

func.is_workflow_task = True
func.execution_metadata = metadata

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Avoid mutating global task metadata in-place.

metadata here is the shared object stored in GlobalWorkflowTaskRegistry. When you merge overrides into it directly, those changes persist globally. The next MCPApp (or even the next loop iteration) will read the already-mutated dict, so one app’s retry override silently bleeds into every other app that doesn’t set its own override. That turns per-app config into global state and can make activities unexpectedly non-retryable or adopt someone else’s backoff. Please clone the dict (and its retry policy) before merging overrides so each registration gets an isolated copy.

         for target, metadata in registry.get_all_tasks():
             func = unwrap(target)  # underlying function
             activity_name = metadata["activity_name"]
 
             self.logger.debug(f"Registering global workflow task: {activity_name}")
 
+            metadata = {**metadata}
             # Skip if already registered in this app instance
             if activity_name in self._registered_global_workflow_tasks:
                 self.logger.debug(
                     f"Global workflow task {activity_name} already registered, skipping."
                 )
@@
-            override_policy = self._get_configured_retry_policy(activity_name)
+            override_policy = self._get_configured_retry_policy(activity_name)
             if override_policy:
-                existing_policy = metadata.get("retry_policy") or {}
-                metadata["retry_policy"] = {**existing_policy, **override_policy}
+                existing_policy = dict(metadata.get("retry_policy") or {})
+                metadata["retry_policy"] = {**existing_policy, **override_policy}

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/mcp_agent/app.py around lines 1299 to 1307, the code currently mutates
the shared metadata object in-place when merging retry overrides; instead,
create an isolated copy of metadata (e.g., new_metadata = dict(metadata) or
deepcopy if nested structures) and copy the existing retry_policy (e.g.,
existing_policy = dict(metadata.get("retry_policy") or {}) or deepcopy) before
merging the override into that copy, then set func.execution_metadata =
new_metadata so the global registry is not mutated.

Comment on lines +1323 to +1335
and current_path
in {
("otel", "exporters"),
("workflow_task_modules",),
}
):
# Concatenate exporters lists from config and secrets
merged[key] = merged[key] + value
# Concatenate list-based settings while preserving order and removing duplicates
combined = merged[key] + value
deduped = []
for item in combined:
if not any(existing == item for existing in deduped):
deduped.append(item)
merged[key] = deduped
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add ("temporal", "workflow_task_modules") to merge paths

The list concatenation logic correctly handles top-level ("workflow_task_modules",) but misses the nested path ("temporal", "workflow_task_modules"). Since TemporalSettings also has a workflow_task_modules field (line 637), both paths should be special-cased for list concatenation.

             elif (
                 key in merged
                 and isinstance(merged[key], list)
                 and isinstance(value, list)
                 and current_path
                 in {
                     ("otel", "exporters"),
                     ("workflow_task_modules",),
+                    ("temporal", "workflow_task_modules"),
                 }
             ):
🤖 Prompt for AI Agents
In src/mcp_agent/config.py around lines 1323 to 1335, the merge logic
special-cases list concatenation for ("workflow_task_modules",) but misses the
nested path ("temporal","workflow_task_modules"); update the conditional to
include ("temporal","workflow_task_modules") so both top-level and
TemporalSettings' workflow_task_modules are concatenated, then deduplicated in
the same way as the existing branch (i.e., treat either path as list-based
concatenation, combine merged[key] + value, remove duplicates preserving order,
and assign back to merged[key]).

@saqadri saqadri merged commit cdfd5f5 into main Oct 29, 2025
8 checks passed
andrew-lastmile pushed a commit to andrew-lastmile/mcp-agent-fork that referenced this pull request Nov 4, 2025
…activity policies (lastmile-ai#595)

* Get agent factory to work with Temporal, and also allow configurable activity policies

* update with PR feedback
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants